Improve resolver speed #14663

x-hgg-x · 2024-10-10T14:47:17Z

What does this PR try to resolve?

This PR improves the resolver speed after investigations from Eh2406/pubgrub-crates-benchmark#6.

How should we test and review this PR?

Commit 1 adds a test showing a slow case in the resolver, where resolving time is cubic in the number of versions of a crate. It can be used for benchmarking the resolver.

Comparison of the resolving time with various values of N=LAST_CRATE_VERSION_COUNT and C=TRANSITIVE_CRATES_COUNT:

|      | N=100 | N=200 | N=400 |
|------|-------|-------|-------|
| C=1  | 0.25s | 0.5s  | 1.4s  |
| C=2  | 7s    | 44s   | 314s  |
| C=3  | 12s   | 77s   | 537s  |
| C=6  | 30s   | 149s  | 1107s |
| C=12 | 99s   | 447s  | 2393s |
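As a rough sanity check on the "cubic" claim, the timings in the table can be turned into an empirical scaling exponent. This is only a sketch: it assumes the time follows t ∝ N^k between two neighbouring columns, and the helper name `scaling_exponent` is made up for this illustration. An exponent near 3 would mean that doubling N multiplies the time by about 8.

```rust
/// Estimate k in t ∝ N^k from two timings (t1 at size n1, t2 at size n2).
/// Hypothetical helper, not part of the PR.
fn scaling_exponent(n1: f64, t1: f64, n2: f64, t2: f64) -> f64 {
    (t2 / t1).ln() / (n2 / n1).ln()
}

fn main() {
    // Timings in seconds copied from the table above, at N = 100, 200, 400.
    let rows: [(u32, [f64; 3]); 5] = [
        (1, [0.25, 0.5, 1.4]),
        (2, [7.0, 44.0, 314.0]),
        (3, [12.0, 77.0, 537.0]),
        (6, [30.0, 149.0, 1107.0]),
        (12, [99.0, 447.0, 2393.0]),
    ];
    for (c, t) in rows.iter() {
        let k_low = scaling_exponent(100.0, t[0], 200.0, t[1]);
        let k_high = scaling_exponent(200.0, t[1], 400.0, t[2]);
        println!("C={c}: exponent {k_low:.2} (100->200), {k_high:.2} (200->400)");
    }
}
```

For the C=2 and C=3 rows the estimated exponents land between about 2.6 and 2.9, which is close to cubic; the other rows are noisier, so treat this as an order-of-magnitude check rather than a fit.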

Commit 2 replaces the default hasher with the hasher from rustc-hash, decreasing resolving time by more than 50%.

Performance comparison with the test from commit 1 by setting LAST_CRATE_VERSION_COUNT = 100:

| commit          | duration |
|-----------------|----------|
| master          | 16s      |
| with rustc-hash | 6.7s     |
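For readers unfamiliar with how a hasher is swapped in Rust, here is a minimal, std-only sketch. `ToyFastHasher`, `FastMap`, and `build_index` are hypothetical names invented for this example, and the mixing function is a stand-in, not the rustc-hash algorithm. With the real crate one would typically use its `FxHashMap` alias instead of defining a hasher by hand.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical stand-in for a fast non-cryptographic hasher.
/// This is NOT the rustc-hash implementation, only an illustration.
#[derive(Default)]
struct ToyFastHasher {
    state: u64,
}

impl Hasher for ToyFastHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // Rotate, mix in the byte, then multiply by an odd constant.
            self.state = (self.state.rotate_left(5) ^ u64::from(b))
                .wrapping_mul(0x9E37_79B9_7F4A_7C15);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Same HashMap type, only the hasher parameter changes.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<ToyFastHasher>>;

/// Build a name -> position index using the swapped hasher.
fn build_index(names: &[&str]) -> FastMap<String, usize> {
    // `HashMap::new()` only exists for the default hasher, so use `default()`.
    let mut map = FastMap::default();
    for (i, name) in names.iter().enumerate() {
        map.insert(name.to_string(), i);
    }
    map
}

fn main() {
    let index = build_index(&["serde", "rand", "syn"]);
    println!("rand -> {:?}", index.get("rand"));
}
```

The point of the sketch is that call sites barely change: the map API is identical, and only the type alias and the constructor (`default()` instead of `new()`) differ.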

Firefox profiles (can be read with https://profiler.firefox.com):
perf.tar.gz (https://github.com/user-attachments/files/17318243/perf.tar.gz)

r? Eh2406

src/cargo/core/package_id.rs

Eh2406

Reviewing one commit at a time. Some small nits about the test.

crates/resolver-tests/tests/resolve.rs

src/cargo/core/resolver/conflict_cache.rs

Eh2406

Absolutely love the switch to rustc-hash, from my testing this is a huge speed up with no downside.

Still reviewing the other commits.

crates/resolver-tests/tests/resolve.rs

src/cargo/core/package_id.rs

src/cargo/core/resolver/activation_key.rs

src/cargo/core/resolver/context.rs

epage · 2024-10-10T16:24:10Z

> Absolutely love the switch to rustc-hash, from my testing (Eh2406/pubgrub-crates-benchmark#6 (comment)) this is a huge speed up with no downside.

Should we split the test and rustc-hash commits out into their own PR and get that merged while we work through nohash?

src/cargo/core/resolver/activation_key.rs

src/cargo/core/resolver/context.rs

x-hgg-x · 2024-10-10T17:19:17Z

I split the nohash commits out into #14665.

Eh2406 · 2024-10-10T17:23:00Z

@bors r+

bors · 2024-10-10T17:23:04Z

📌 Commit 93db5bf has been approved by Eh2406

It is now in the queue for this repository.

bors · 2024-10-10T17:24:12Z

⌛ Testing commit 93db5bf with merge 6d679d3...

weihanglo · 2024-10-10T17:36:26Z

While this part of the code doesn't seem to be at risk from hash collisions, there is a concern about other parts of Cargo, see #13171 (comment).

I also did some exploration of blake3 as a stable hash algorithm candidate, and blake3 has become a dependency since #14137. Have we benchmarked blake3 as well?

Eh2406 · 2024-10-10T17:40:49Z

All of these hashes are being used exclusively in memory. So the concerns about consistency between builds let alone consistency between architectures are not relevant here. Happy to see experimentation with other algorithms though.

bors · 2024-10-10T17:43:54Z

💔 Test failed - checks-actions

x-hgg-x · 2024-10-10T18:06:50Z

I rebased on master.

weihanglo · 2024-10-10T18:07:01Z

Cargo.lock

@Eh2406 I moved the conversation here so it is easier for us to follow.

> All of these hashes are being used exclusively in memory. So the concerns about consistency between builds let alone consistency between architectures are not relevant here.

Despite that, the lang team has recently decided to migrate away from weak hash algorithms. I assume that type_id is also mostly in-memory. Do we need to follow suit?

Collisions in type_id rust#10389 (comment)

type_id is not sufficiently collision-resistant rust#129014

How easy is it for an attacker to fake a PackageId, and how bad is it when it really happens?

The problem with type_id is that it assumes that if the hashes are equal then the objects are equal. In its case, it even goes so far as to do an unsafe transmute based on that assumption.

In this PR's case, we only use it for a HashMap. If two things spuriously have the same hash, then they will be checked by eq. If an attacker could manage to get a lot of things to hash to the same thing then cargo would spend O(n) time finding the right one instead of O(1) amortized.
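To make the "checked by eq" argument concrete, here is a small std-only sketch with a deliberately terrible hasher that maps every key to the same hash. `CollidingHasher`, `CollidingMap`, and `fill_colliding_map` are names invented for this illustration and are not Cargo code. Lookups stay correct because the map compares keys with `Eq`; only the cost per lookup degrades.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Worst-case hasher for illustration: every key hashes to 0.
#[derive(Default)]
struct CollidingHasher;

impl Hasher for CollidingHasher {
    fn write(&mut self, _bytes: &[u8]) {
        // Ignore the input entirely so that all keys collide.
    }

    fn finish(&self) -> u64 {
        0
    }
}

type CollidingMap<K, V> = HashMap<K, V, BuildHasherDefault<CollidingHasher>>;

/// Insert `n` distinct keys that all share the same hash value.
fn fill_colliding_map(n: u32) -> CollidingMap<u32, String> {
    let mut map = CollidingMap::default();
    for i in 0..n {
        map.insert(i, format!("pkg-{i}"));
    }
    map
}

fn main() {
    let map = fill_colliding_map(1_000);
    // Every key collides, yet each lookup still finds exactly its own entry,
    // because candidates with a matching hash are compared with `==`.
    assert_eq!(map.len(), 1_000);
    assert_eq!(map.get(&42).map(String::as_str), Some("pkg-42"));
    assert!(map.get(&5_000).is_none());
    println!("all lookups correct despite identical hashes");
}
```

This is the difference from the type_id situation described above: a collision here costs time (a linear scan over colliding entries) but never returns the wrong value.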

> How easy is it for an attacker to fake a PackageId, and how bad is it when it really happens?

PackageIds are only generated for dependencies. If a dependency is controlled by the attacker, then serious hash collisions are the least of our problems.

Even SipHash, which is what Cargo currently uses from std, is not an option from t-lang's perspective in that linked issue. As you said, we have eq, so Cargo should be even harder to exploit.

> If an attacker could manage to get a lot of things to hash to the same thing then cargo would spend O(n) time finding the right one instead of O(1) amortized.

This might be a problem for companies running either the cargo binary or the library as a service. I am not sure if we need to take this into account. Without considering this, I am 👍🏾 on this PR.

Eh2406 · 2024-10-10T19:54:25Z

@bors r+

bors · 2024-10-10T19:54:28Z

📌 Commit 3d4d48b has been approved by Eh2406

It is now in the queue for this repository.

bors · 2024-10-10T19:55:36Z

⌛ Testing commit 3d4d48b with merge 7aa7fb1...

bors · 2024-10-10T20:24:52Z

☀️ Test successful - checks-actions
Approved by: Eh2406
Pushing 7aa7fb1 to master...

rustbot assigned Eh2406 Oct 10, 2024

rustbot added labels A-dependency-resolution (Area: dependency resolution and the resolver) and S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) Oct 10, 2024

epage reviewed Oct 10, 2024

src/cargo/core/package_id.rs

x-hgg-x force-pushed the resolver-perf branch 2 times, most recently from 40f2907 to 6e57306, October 10, 2024 15:20

Eh2406 approved these changes Oct 10, 2024

crates/resolver-tests/tests/resolve.rs (3 review threads)